Neural correlates of hierarchical predictive processes in autistic adults

Bayesian theories of autism spectrum disorders (ASD) suggest that atypical predictive mechanisms could underlie autistic symptomatology, but little is known about their neural correlates. Twenty-six neurotypical (NT) and 26 autistic adults participated in an fMRI study in which they performed an associative learning task in a volatile environment. By inverting a model of perceptual inference, we characterized the neural correlates of hierarchically structured predictions and prediction errors in ASD. Behaviorally, the predictive abilities of autistic adults were intact. Neurally, predictions were encoded hierarchically in both NT and ASD participants and biased their percepts. Activity in a set of regions tracked high-level predictions more closely in ASD than in NT participants. Prediction errors yielded activation in shared regions in NT and ASD participants, but group differences were found in the anterior cingulate cortex and putamen. This study sheds light on neural specificities of ASD that might underlie atypical predictive processing.

The authors interpret their findings to suggest overall intact neural mechanisms at the lower-level end of the cortical hierarchy in ASD, while neural correlates of predictions and prediction errors at higher levels appeared atypical. The paper is interesting, as it tries to model decision making under uncertainty in autism in a Bayesian context and as it relates ASD to atypical organization of neural hierarchies. I have several comments and suggestions to further strengthen the authors' conclusions:
Introduction:
1) Can the authors provide further details on the inclusion criteria: were these individuals with 'idiopathic' ASD and no genetic (eg Fragile X) or metabolic diagnosis? Please provide further information on the medication taken by the individuals with ASD. Are the main findings consistent when individuals with ADHD, dyslexia, and Tourette are excluded?
Methods:
2) The authors convolve their design matrix with a canonical hemodynamic response function, but their conclusions heavily invoke the notion of cortical hierarchy. Several prior studies have suggested spatially variable HRF properties across the cortex (eg https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5911213/pdf/nihms949811.pdf), also differentiating higher and lower level areas. Moreover, there is prior work in non-human primates and in humans suggesting overall longer intrinsic neural time scales in higher order regions compared to sensory and motor regions (eg the work from Chaudhuri et al in Neuron but also more recent fMRI studies in humans eg Ito and Cole 2020 Neuroimage). While I think it's good they also perform an ROI-based analysis in addition to their whole-brain assessment, I would also recommend that the authors comment on this issue in the discussion. Moreover, they are encouraged to present a robustness analysis based on an alternative approach (eg FIR models?) that ensures that conclusions of the current work are not affected by potential variations in the HRF.
3) There appears to be some degree of selectivity in the choice of regions of interest for the DCM analysis. Aren't there recent extensions of the DCM framework that in principle allow for whole-brain modelling, e.g. rDCM (https://pubmed.ncbi.nlm.nih.gov/28259780/)? If one wants to stick to a DCM based on a small set of nodes, one could also use prior data-driven maps of the putative cortical hierarchy to guide, or at least support, the ROI placement in this work. The principal gradient derived by Margulies et al (2016, PNAS) and/or the microstructural differentiation reported by Paquola et al. (2018, PNAS) may come to mind, and/or classic depictions such as those by Mesulam (1998, Brain).
Results:
4) Can the authors clarify how they determined the cluster-level p<0.05 when findings were not based on FWE correction?
5) To follow up on point 3 above, it might be interesting to correlate findings in Figure 3 (e.g. uncorrected t-maps) with maps of functional (Margulies) and microstructural (Paquola) hierarchy, in order to assess associations with data-driven models of human cortical hierarchy organization in both groups. Ditto for Figure 4.

Reviewer #2 (Remarks to the Author):
A HGF model is used to assess how ASD and NT individuals encode predictions and prediction errors in the brain for motion discrimination. Importantly, the direction of the motion stimulus is contingent on an auditory cue presented prior to the motion stimulus. This AV contingency changes across trials.
There are a couple of issues:
1. In contrast to the authors' previous study, the current study does not generate ambiguous stimuli (e.g. by removing stereodisparity cues). Instead, they treat static stimuli as ambiguous motion. It is not clear that static visual dots are perceived as ambiguous. Furthermore, the dual-report paradigm may introduce additional complexities. For instance, Stocker et al. have shown that observers condition their responses on their other responses for self-consistency in dual-report paradigms. This may need to be explored or accounted for in the model.
2. The authors use a hierarchical HGF model. The model is not precisely described. Moreover, even after reading previous publications, I am left with questions about the model's assumptions (such as different priors for ambiguous and unambiguous stimuli) and update equations. Little justification for model choices is offered.
3. No model or parameter recovery is reported. This is even more important for this particular paper, because the authors do not find any significant differences in behavioural performance or model parameters across groups. Differences only arise for how model parameters are related to brain activations in NT and ASD individuals. Therefore, it is of utmost importance to ensure that parameters can be very precisely recovered. The HGF model includes a large number of parameters (the number of parameters is not explicitly stated), so it may not be that surprising that a subset of parameters predicts activations differently in NT and ASD.
4. The clarity of the paper could be improved. The introduction conflates interpretation and results. For instance, it remains controversial whether forward connections convey prediction errors during perceptual inference.
5. The authors suggest that contingencies influence observers' percept of static dots. Numerous studies have shown representations in MT/V1 associated with observers' percept. Perhaps the authors could perform additional decoding analyses to support their claims, i.e. show that AV contingencies influence the representation decoded from MT.
6. It is unclear why different DCMs are specified for priors and prediction errors. As both priors and prediction errors need to be computed on every trial, I would think both aspects should be integrated in one DCM.

---Responses to the reviewers ---
We are grateful to the reviewers for their careful reading, their encouraging remarks, and their suggestions, which helped us improve the manuscript. Please find below our answers to all questions. Modifications made to the paper appear between quotation marks and in blue here, and are also written in blue in the revised main document.

Neural correlates of hierarchical predictive processes in autistic adults
The authors studied 26 NT and 26 ASD adults using task fMRI to assess predictive processing during an audiovisual associative learning task. Behaviorally, there were no marked differences in predictive abilities between groups. Neurally, they observed activations for low-level predictions in sensory areas, while high-level predictions involved transmodal areas in both groups. Group differences mainly concerned activity related to higher- and mid-level predictions.
The authors interpret their findings to suggest overall intact neural mechanisms at the lower-level end of the cortical hierarchy in ASD, while neural correlates of predictions and prediction errors at higher levels appeared atypical. The paper is interesting, as it tries to model decision making under uncertainty in autism in a Bayesian context and as it relates ASD to atypical organization of neural hierarchies. I have several comments and suggestions to further strengthen the authors' conclusions:
Introduction:
1) Can the authors provide further details on the inclusion criteria: were these individuals with 'idiopathic' ASD and no genetic (eg Fragile X) or metabolic diagnosis? Please provide further information on the medication taken by the individuals with ASD. Are the main findings consistent when individuals with ADHD, dyslexia, and Tourette are excluded?
Answer: Indeed, all the autistic participants included in this study had an idiopathic form of ASD, as no genetic or metabolic causes had been identified. This is now specified in the Methods: "Autistic participants received their diagnoses from a multidisciplinary Expertise Centre for Autism […] and had idiopathic ASD." (p. 6).
Moreover, we now provide the full list of medications reported by the autistic participants: "[…] (1))." (p. 6).
The behavioral and modeling results did not change after excluding the five participants who had comorbidities:
-Percentage of correct prediction responses above chance level in ASD (75% with n = 21 ASD, 73% with n = 26 ASD, p < .0001 with both the small and large samples), and no significant group difference in the percentage of correct prediction responses (76% in the NT group, group comparison: p = .76).
-Percentage of correct perception response in unambiguous trials not significantly different between groups (99% with n = 21 and n = 26 ASD, 99% in the NT group).
-Percentage of perception responses following the main contingency in ambiguous trials above chance level (67% with n = 21 ASD, 66% with n = 26 ASD, p < .0001), and not significantly different between groups (72% in the NT group, p = .17).
The main fMRI findings also did not change when excluding the autistic participants who had comorbidities. First, the global pattern of the activation maps was unchanged after excluding these five participants.
Second, the main results of the localizer analysis also did not change after removing the five participants with comorbidities. Indeed, hearing a tone that was predictive of a CW or CCW rotation yielded activity in the CW or CCW mask, respectively, in 70% of the ASD participants (68% in CW mask and 72% in CCW mask with n = 26).
Finally, the contrast estimates extracted from the clusters showing group differences (Figs. 3 and 4) also revealed that the participants with comorbidities did not influence the results: their estimates were always within one (or sometimes two) standard deviations of the mean and never appeared to be outliers. After removing the five participants with comorbidities, the t-tests on contrast estimates remained significant in the six clusters where group differences were found (all p-values < .0002).

We have added a sentence in the Methods section stating that these participants did not appear to influence either the behavioral or the fMRI analyses: "These participants who had comorbidities are included in the analyses, but note that the results of the behavioural and fMRI analyses did not change after removing these five participants." (p. 6).

Methods:
2) The authors convolve their design matrix with a canonical hemodynamic response function, but their conclusions heavily invoke the notion of cortical hierarchy. Several prior studies have suggested spatially variable HRF properties across the cortex (eg https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5911213/pdf/nihms949811.pdf), also differentiating higher and lower level areas. Moreover, there is prior work in non-human primates and in humans suggesting overall longer intrinsic neural time scales in higher order regions compared to sensory and motor regions (eg the work from Chaudhuri et al in Neuron but also more recent fMRI studies in humans eg Ito and Cole 2020 Neuroimage). While I think it's good they also perform an ROI-based analysis in addition to their whole-brain assessment, I would also recommend that the authors comment on this issue in the discussion. Moreover, they are encouraged to present a robustness analysis based on an alternative approach (eg FIR models?) that ensures that conclusions of the current work are not affected by potential variations in the HRF.

Answer:
We thank the reviewer for the references and appreciate their concern. However, we would like to highlight that the design of our study is not optimal for FIR analyses. Indeed, FIR estimates can be quite noisy and therefore less powerful. In the study by Ito and Cole, 2020, the authors were able to use an FIR model to process their data, but it should be noted that they had a block design with a TR of 720 ms and a multiband factor of 8, whereas we have an event-related design with a TR of 2 s and a multiband factor of 2.
Even though our design is clearly not optimized for FIR analyses, we conducted a simple FIR analysis on the two regressors of our main GLM (i.e., Tone and Rotation), defining 10 time bins of 2 s each. We specified an FIR model and conducted first- and second-level analyses. Based on the main GLM presented in the manuscript, we selected the coordinates of the four clusters associated with the highest T values when ASD and NT groups were pooled together, for the […]. As there was no group-by-ROI interaction, we conclude that our fMRI results were not affected by different temporal patterns of the HRF between groups, but we stress that any conclusion from this FIR analysis must be considered cautiously, as our design is not well suited for this kind of analysis.
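The FIR approach described above can be illustrated with a minimal Python sketch. It is not the SPM12 FIR basis set the authors used; as assumptions, the hypothetical function `fir_design` takes onsets in seconds, uses bins as wide as the 2 s TR, and rounds onsets to the nearest scan. Each column is a stick regressor for one post-onset lag, so ordinary least squares estimates the response at each lag without assuming a canonical HRF shape; the usage lines check this on a noiseless toy signal with two overlapping events.

```python
import numpy as np


def fir_design(onsets_s, n_scans, tr=2.0, n_bins=10):
    """Build a finite impulse response (FIR) design matrix.

    Column k is 1 at the scan falling k bins after each event onset, so its
    GLM estimate is the mean response at that post-onset lag. Simplifying
    assumption: bin width equals the TR and onsets are rounded to the nearest
    scan (SPM's FIR basis set works at microtime resolution instead).
    """
    X = np.zeros((n_scans, n_bins))
    for onset in onsets_s:
        first_scan = int(round(onset / tr))
        for k in range(n_bins):
            scan = first_scan + k
            if 0 <= scan < n_scans:
                X[scan, k] = 1.0
    return X


# Usage: two overlapping events (0 s and 4 s), then recover a known shape by OLS.
X = fir_design([0.0, 4.0], n_scans=12)
true_shape = np.array([0.0, 0.5, 1.0, 0.8, 0.4, 0.1, -0.1, -0.2, -0.1, 0.0])
y = X @ true_shape  # noiseless toy time series (hypothetical values)
estimate, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Because overlapping responses are simply summed in the design, the same least-squares fit separates the lags even when events are closer together than the response duration, which is one reason FIR models are used as an HRF-agnostic check; with real, noisy data the per-lag estimates are much less stable than in this toy case.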
As advised by the reviewer, we now discuss this point in the Discussion and have added the references mentioned by the reviewer: "Using a design with shorter TRs could have revealed variability across different regions of the cortex in the temporal shape of the HRF 82, with faster responses in sensory regions than in associative areas 83." (p. 28).
3) There appears to be some degree of selectivity in the choice of regions of interest for the DCM analysis. Aren't there recent extensions of the DCM framework that in principle allow for whole-brain modelling, e.g. rDCM (https://pubmed.ncbi.nlm.nih.gov/28259780/)? If one wants to stick to a DCM based on a small set of nodes, one could also use prior data-driven maps of the putative cortical hierarchy to guide, or at least support, the ROI placement in this work. […] is specified that there are high requirements in terms of data quality, with fast TR and high SNR, to be able to use rDCM. We therefore believe that our experimental design is not suited to rDCM.
However, following the sixth comment of Reviewer 2, we present a new version of the DCM results (see response below).
Results: 4) Can the authors clarify how they determined the cluster level p<0.05 when findings were not based on FWE correction?
Answer: We were referring to cluster-level extent thresholding (as opposed to peak-level thresholding), as implemented in SPM, where cluster size is measured as a number of contiguous voxels.
The cluster-extent thresholding method relies on Gaussian Random Field theory, as implemented in SPM12. We clarified this point in the Methods.
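To make the cluster-extent logic in the answer to point 4 concrete, the Python sketch below thresholds a statistic map at a cluster-defining threshold and keeps only connected clusters of at least `k_min` voxels. It is a simplified illustration, not SPM12's code: it assumes 6-connectivity (face neighbours), whereas SPM's own connectivity rule may differ, and it takes `k_min` as given instead of deriving it from Gaussian Random Field theory, which is how SPM obtains the extent corresponding to a cluster-level p < 0.05. The function name `extent_threshold` is hypothetical.

```python
from collections import deque

import numpy as np

# Face-adjacent neighbours in 3-D (6-connectivity); an assumption of this sketch.
OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def extent_threshold(tmap, t_thresh, k_min):
    """Return a boolean mask of suprathreshold voxels in clusters of >= k_min voxels."""
    supra = tmap > t_thresh
    visited = np.zeros(supra.shape, dtype=bool)
    keep = np.zeros(supra.shape, dtype=bool)
    for seed in zip(*np.nonzero(supra)):
        if visited[seed]:
            continue
        # Breadth-first search collects every voxel connected to this seed.
        cluster, queue = [], deque([seed])
        visited[seed] = True
        while queue:
            voxel = queue.popleft()
            cluster.append(voxel)
            for offset in OFFSETS:
                nb = tuple(int(c + d) for c, d in zip(voxel, offset))
                # Bounds check first: negative indices would otherwise wrap around.
                inside = all(0 <= nb[i] < supra.shape[i] for i in range(3))
                if inside and supra[nb] and not visited[nb]:
                    visited[nb] = True
                    queue.append(nb)
        # Extent threshold: discard clusters smaller than k_min voxels.
        if len(cluster) >= k_min:
            for voxel in cluster:
                keep[voxel] = True
    return keep


# Usage on a toy 5x5x5 map: an 8-voxel block survives, an isolated voxel does not.
tmap = np.zeros((5, 5, 5))
tmap[0:2, 0:2, 0:2] = 5.0
tmap[4, 4, 4] = 5.0
mask = extent_threshold(tmap, t_thresh=3.0, k_min=5)
```

The key point this illustrates is that inference is made on the cluster as a unit: individual voxels above the cluster-defining threshold are only reported if they belong to a sufficiently large contiguous cluster.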

5) To follow up on point 3 above, it might be interesting to correlate findings in Figure 3 (e.g. uncorrected t-maps) with maps of functional (Margulies) and microstructural (Paquola) hierarchy, in order to assess associations with data-driven models of human cortical hierarchy organization in both groups. Ditto for Figure 4.

Reviewer #2 (Remarks to the Author):
A HGF model is used to assess how ASD and NT individuals encode predictions and prediction errors in the brain for motion discrimination. Importantly, the direction of the motion stimulus is contingent on an auditory cue presented prior to the motion stimulus. This AV contingency changes across trials.
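There are a couple of issues:
1. In contrast to the authors' previous study, the current study does not generate ambiguous stimuli (e.g. by removing stereodisparity cues). Instead, they treat static stimuli as ambiguous motion. It is not clear that static visual dots are perceived as ambiguous. Furthermore, the dual-report paradigm may introduce additional complexities. For instance, Stocker et al. have shown that observers condition their responses on their other responses for self-consistency in dual-report paradigms. This may need to be explored or accounted for in the model.
Answer: First, the behavioral responses in ambiguous trials show that the percentage of trials perceived according to the current contingency is above chance level in both groups (72% in the NT group and 66% in the ASD group, p < .0001 in both groups). This means that participants indeed had the impression that the two dots were rotating (otherwise, such percentages would not differ from chance level). At the individual level, these percentages were below 50% in only 3 out of 52 participants.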
Furthermore, we have now added results from the confidence rating task, which took place outside the MRI scanner after the main experiment. Initially, these data were not included in the manuscript, to focus on the fMRI results and to avoid an overly long description of the behavioral results, as they served as quality checks. Data showed that ambiguous trials were indeed perceived as more uncertain than unambiguous trials. We added information regarding this task in the Methods (p. 9): "After finishing the fMRI experiment, participants completed a short computer task as in 33 to assess the perceptual quality of the ambiguous trials. The structure of this task was the same as the main task, but trials included a third response screen showing the options "1. Very sure", "2. Quite sure", "3. Quite unsure", "4. Very unsure" (displayed for 2600 ms […]"
When debriefing the participants after the confidence rating task, we asked them whether they had the impression that the dots were not turning. Only two NT participants and five ASD participants reported that they sometimes had the impression that the dots were not turning. Note, however, that we asked this question after the participants completed the confidence rating task, which might have biased their judgment, as this task made them wonder about the certainty of their percept. Yet, none of the participants reported spontaneously, during or after the experiment, that they noticed the dots were sometimes not turning.
Given our behavioral results from the main task and from the confidence rating task, as well as the feedback from the participants and the results from Weilnhammer et al., 2018, we believe that we succeeded in generating ambiguous percepts.
Regarding the second part of this comment, using a dual-report paradigm may indeed introduce additional complexities, such as a repetition bias arising from participants' tendency toward self-consistency.
We have added this as a limitation in the Discussion (p. 28): "We chose to rely on a dual-report paradigm with both prediction and perception responses to get an explicit measure of prediction learning (where tones could not simply be ignored) and to get an implicit measure of perceptual bias, but using such a paradigm might bias responses, as participants often tend to ensure self-consistency 81." In the associative learning task by Lawson et al. (2017), participants only gave a perception response (but no prediction response). They assessed how priors were learned based on response times and error rates, and showed that these variables were not modulated by the predictability of the stimuli in autistic participants, in contrast with NT participants. The authors concluded that autistic adults responded inflexibly to expected and unexpected outcomes and overestimated the volatility of their environment. However, autistic participants might have simply ignored the tones, which were irrelevant to performing the task and whose underlying association with visual outputs was complex and changing. In our task, the dual-report paradigm was a strength, as it ensured that tones were not ignored and that participants really tried to learn the cue-outcome association.
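2) The authors use a hierarchical HGF model. The model is not precisely described. Moreover, even after reading previous publications, I am left with questions about the model's assumptions (such as different priors for ambiguous and unambiguous stimuli) and update equations. Little justification for model choices is offered.
Answer: […] Figure 2A and 2B. We hope that adding the mathematical description has clarified the model.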

3) No model or parameter recovery is reported. This is even more important for this particular paper, because the authors do not find any significant differences in behavioural performance or model parameters across groups. Differences only arise for how model parameters are related to brain activations in NT and ASD individuals. Therefore, it is of utmost importance to ensure that parameters can be very precisely recovered. The HGF model includes a large number of parameters (the number of parameters is not explicitly stated), so it may not be that surprising that a subset of parameters predicts activations differently in NT and ASD.

Answer:
To assess the validity of our modeling approach, both in terms of discriminability between models and parameter recovery, we performed simulations. First, using Bayesian Model Selection to compare the eight models on the simulated data, we observed that model M5 best explained the data in both groups (protected exceedance probabilities: 1.00). This is in line with our findings reported in Figure 2. […] Finally, we would like to note that the model best fitting the data in both groups is not the model with the largest number of parameters among the eight models that we tested, which suggests a decreased risk of overfitting. Indeed, the winning model is the one including only associative learning, with no priming or sensory memory parameters.
4) The clarity of the paper could be improved. The introduction conflates interpretation and results. For instance, it remains controversial whether forward connections convey prediction errors during perceptual inference.

Answer:
We apologize for the lack of clarity. We have revised the manuscript to improve its overall clarity, in particular the Introduction, to better distinguish interpretations from results. We also specified that prediction errors are only hypothesized to be conveyed through forward connections.
5) The authors suggest that contingencies influence observers' percept of static dots. Numerous studies have shown representations in MT/V1 associated with observers' percept. Perhaps the authors could perform additional decoding analyses to support their claims, i.e. show that AV contingencies influence the representation decoded from MT.
Answer: Indeed, it would be relevant to show such activation in MT for ambiguous trials, to assess whether they are perceived as rotating dots. However, the ambiguous trials with static dots only represent 12.5% of the trials (i.e., 9 trials per run), which is too few to run decoding analyses.
Interestingly, in a recent fMRI study, Haarsma and colleagues (2022, BioRxiv) used an associative learning paradigm in which a tone was followed either by a grating with a certain orientation or by noisy patches. They investigated whether a false percept could be observed, i.e., whether the expected orientation would be perceived in noisy patches when the tone was predictive of that orientation. They were only able to observe this effect in the middle input layer of V2, and stressed that it could only be observed thanks to the use of a 7T MRI scanner.
However, it should be noted that previous studies successfully decoded the perceived rotation direction of dots from ambiguous stimuli (e.g., Schmack et al., 2017).
Based on the reviewer's comment, we added this point as a perspective in the Discussion (p. 26): "Future studies including more ambiguous trials could perform decoding analyses (e.g., as in 71) to assess whether the perceptual bias induced in ambiguous trials generates activity in motion area MT." As we could not perform decoding analyses on the restricted number of ambiguous trials that we had, we specified another GLM to model ambiguous trials and used an anatomical mask of MT/V5 to assess whether ambiguous trials were associated with activity in this region. These additional analyses were added in the Supplementary Material, Appendix 3:

"Appendix S3: Activation by ambiguous trials in MT/V5
In order to determine if the presentation of ambiguous trials generated activity in MT/V5, we specified another GLM with Unambiguous rotation and Ambiguous rotation as regressors.
These regressors were coded as events starting at the appearance of the two vertical dots. To account for additional variance, we also included the following regressors: Tone ([…]). We found significant activation in the MT/V5 mask in response to ambiguous trials both in the NT group ([…] z = 2, T = 8.7, p < .01; right cluster: x = 48, […] T = 6.8, p < .01) and in the ASD group ([…] z = 2, T = 10.1, p < .01; right cluster: x = 50, […] z = 2, T = 7.9, p < .01). Results are illustrated below."

6) It is unclear why different DCMs are specified for priors and prediction errors. As both priors and prediction errors need to be computed on every trial, I would think both aspects should be integrated in one DCM.
Answer: We initially chose to run two separate DCMs to keep the model space small and to focus on either priors or prediction errors. However, following the reviewer's comment, we ran new DCM analyses that combine both priors and prediction errors and integrate a larger set of regions, i.e., the bilateral occipital cortices, the left auditory cortex, the insula and the orbitofrontal cortex.
We adapted the Methods accordingly (pp. 14-16): "The GLM used in the DCM analysis was the same as in the main analysis, but only included the regressors Tone, Rotation, | 3 |, | 2 |, | 3 | and | 2 |. […] Figure 5)." We also updated the Results (p. 23): "We assessed whether top-down and bottom-up connections were modulated by […] respectively (Figure 5.A). The BMS (Figure 5.B) […]" Finally, we made small adjustments in the Discussion, but the main conclusions did not change. Figure 5 was also adjusted to present the new models and results.

REVIEWER COMMENTS
Reviewer #1 (Remarks to the Author): I would like to thank the authors for their thoughtful revisions, and do now recommend the paper for publication.
Reviewer #2 (Remarks to the Author): Thanks for responding so thoroughly to previous comments. The key aspect that requires further attention is parameter recovery. The revised manuscript reports correlations between true and recovered parameters. There are two issues:
1. More methodological details are needed to allow assessment of the parameter recovery (e.g. which parameter range, how many simulations etc.)
2. Because the key finding is a difference in parameter estimates across groups, the following is in addition required:
-simulate behavioural responses for each of your ASD and NT individuals using the parameter estimates you obtained from the initial fit of the winning model.
-fit the same model to these simulated responses for each simulated ASD and NT individual
-perform the same statistics as for your original data sets
-repeat steps 1-3 multiple times and report the fraction of simulations in which the difference across groups is significant as an index for statistical power.
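The four-step power analysis requested above can be sketched in Python as follows. The sketch deliberately does not implement the HGF: as a loudly flagged assumption, it substitutes a hypothetical one-parameter Rescorla-Wagner learner (learning rate `alpha`, fixed inverse temperature `beta`) fitted by grid-search maximum likelihood, in a hypothetical volatile environment whose cue-outcome contingency (p = 0.8) reverses every 20 trials. All function names and settings are illustrative. The per-participant `alpha` values passed in stand for the estimates from the initial fit; each repetition simulates responses (step 1), refits them (step 2), runs a two-sample t-test (step 3), and the fraction of significant repetitions (step 4) indexes power.

```python
import numpy as np

ALPHA_GRID = np.linspace(0.02, 0.98, 25)  # candidate learning rates for the fit


def make_outcomes(rng, n_blocks=6, block_len=20, p=0.8):
    """Binary outcomes whose contingency reverses every block (volatility)."""
    probs = np.repeat([p if b % 2 == 0 else 1 - p for b in range(n_blocks)], block_len)
    return (rng.random(probs.size) < probs).astype(int)


def choice_prob(v, beta):
    """Logistic probability of predicting outcome 1 given belief v."""
    return 1.0 / (1.0 + np.exp(-beta * (v - 0.5)))


def simulate_choices(alpha, outcomes, beta, rng):
    """Step 1: simulate one participant's binary predictions."""
    v = 0.5
    choices = np.empty(outcomes.size, dtype=int)
    for t, outcome in enumerate(outcomes):
        choices[t] = rng.random() < choice_prob(v, beta)
        v += alpha * (outcome - v)  # Rescorla-Wagner update
    return choices


def fit_alpha(choices, outcomes, beta):
    """Step 2: maximum-likelihood learning rate over ALPHA_GRID."""
    v = np.full(ALPHA_GRID.shape, 0.5)  # one belief trajectory per candidate
    loglik = np.zeros(ALPHA_GRID.shape)
    for choice, outcome in zip(choices, outcomes):
        p1 = np.clip(choice_prob(v, beta), 1e-9, 1 - 1e-9)
        loglik += np.log(p1) if choice == 1 else np.log(1 - p1)
        v += ALPHA_GRID * (outcome - v)
    return ALPHA_GRID[np.argmax(loglik)]


def pooled_t(a, b):
    """Step 3: two-sample Student t statistic (pooled variance)."""
    na, nb = a.size, b.size
    sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    if sp2 == 0:  # degenerate case: no within-group variance
        return 0.0 if a.mean() == b.mean() else np.inf
    return (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / na + 1 / nb))


def power(alpha_nt, alpha_asd, beta=5.0, n_reps=100, t_crit=2.009, seed=0):
    """Step 4: fraction of repetitions with a significant group difference.

    t_crit = 2.009 is the two-sided 5% critical value for df = 50, i.e. two
    groups of 26; it must be changed for other group sizes.
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_reps):
        outcomes = make_outcomes(rng)  # same sequence for everyone (assumption)
        fitted = [
            np.array([fit_alpha(simulate_choices(a, outcomes, beta, rng), outcomes, beta)
                      for a in group])
            for group in (alpha_nt, alpha_asd)
        ]
        if abs(pooled_t(*fitted)) > t_crit:
            hits += 1
    return hits / n_reps


# Usage with hypothetical parameter values (NOT the study's estimates):
rng = np.random.default_rng(1)
alpha_nt = rng.uniform(0.2, 0.4, size=26)
alpha_asd = rng.uniform(0.5, 0.7, size=26)
estimated_power = power(alpha_nt, alpha_asd, n_reps=10)
```

Passing identical parameter distributions for both groups turns the same loop into a check of the false-positive rate, which should stay near the nominal 5%.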

---Responses to the reviewers ---
We are grateful to the reviewers for their suggestions, which helped us improve the manuscript. Please find below our answer to the remaining question. We apologize that it took three months to send this response; the first author is currently on maternity leave. Modifications made to the paper appear in blue here and in the revised manuscript.
 Reviewer #1 (Remarks to the Author): I would like to thank the authors for their thoughtful revisions, and do now recommend the paper for publication.

Answer:
We thank the reviewer for their feedback.

 Reviewer #2 (Remarks to the Author):
Thanks for responding so thoroughly to previous comments.
The key aspect that requires further attention is parameter recovery. The revised manuscript reports correlations between true and recovered parameters. There are two issues:
1. More methodological details are needed to allow assessment of the parameter recovery (e.g. which parameter range, how many simulations etc.)
2. Because the key finding is a difference in parameter estimates across groups, the following is in addition required:
-simulate behavioural responses for each of your ASD and NT individuals using the parameter estimates you obtained from the initial fit of the winning model.
-fit the same model to these simulated responses for each simulated ASD and NT individual
-perform the same statistics as for your original data sets
-repeat steps 1-3 multiple times and report the fraction of simulations in which the difference across groups is significant as an index for statistical power. see e.g. Wilson and Collins (2019